Visualising Data
Leighton Pritchard
6 September 2016
IMAGINE…
YOUR FIGURES ARE AMAZING!
BUT MISLEADING
A bar chart
Two effectors
Knocked out independently
Host chlorosis measured
Communication
Stories told through figures
Scales matter
Indication of quantities
Context matters
Figures are what you remember of a story
Figures can mislead
Binary thinking
The same or not the same
Larger or smaller
What about uncertainty?
Another Bar Chart
Four effectors
Bacterial effectors
Inoculate wild-type plants
Measure growth (CFU)
Four bar plots
Do the effectors have the same effect?
Add error bars
Do the effectors have the same effect?
Error bars
Estimates of uncertainty
But uncertainty of what?
standard deviation (sd): describes the data: how much members of the group differ from the mean
standard error of the mean (sem): describes the estimate of the mean: standard deviation of the estimate of the mean
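The difference between the two error bars can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the measurements are made-up numbers, not the data behind the slides, and the helper name `sd_and_sem` is hypothetical.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample standard deviation, standard error of the mean)."""
    sd = statistics.stdev(values)          # spread of the data (n - 1 denominator)
    sem = sd / math.sqrt(len(values))      # uncertainty in the estimate of the mean
    return sd, sem


# Hypothetical measurements (assumption: illustrative values only).
small_sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
# The same values repeated four times: same spread, four times the sample size.
large_sample = small_sample * 4

sd_small, sem_small = sd_and_sem(small_sample)
sd_large, sem_large = sd_and_sem(large_sample)

print(f"n={len(small_sample):2d}  sd={sd_small:.2f}  sem={sem_small:.2f}")
print(f"n={len(large_sample):2d}  sd={sd_large:.2f}  sem={sem_large:.2f}")
```

The sd stays roughly the same as more data are collected, because it describes the data. The sem shrinks, because it describes how well the mean is estimated. A figure that does not say which was plotted cannot be interpreted.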
SD or SEM?
Which was used (& which would you want to know)?
Raw data
Are they the same responses?
What does mean mean?
Does having the same mean imply having the same response?
What does mean mean?
Unequal sample sizes
What does mean mean?
Outliers
What does mean mean?
Bimodal distribution
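A small sketch of why equal means need not imply equal responses. The three groups below are hypothetical numbers chosen so that every mean is 5; they are not the data shown on the slides.

```python
import statistics

# Hypothetical groups (assumption: illustrative values only), all with mean 5.
groups = {
    "tight":   [4, 5, 5, 6, 5, 5],    # values clustered around the mean
    "outlier": [3, 3, 3, 3, 3, 15],   # one extreme value drags the mean up
    "bimodal": [1, 1, 2, 8, 9, 9],    # two clusters, nothing near the mean
}

# Collect mean, median and sample standard deviation for each group.
summary = {
    name: (statistics.mean(v), statistics.median(v), statistics.stdev(v))
    for name, v in groups.items()
}

for name, (mean, median, sd) in summary.items():
    print(f"{name:8s} mean={mean:.1f} median={median:.1f} sd={sd:.2f}")
```

A bar chart of the three means would show three identical bars, although in the "outlier" group most values sit well below the mean and in the "bimodal" group no value is close to it.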
But stats, right?
We use figures as guides…
“Figures tell a story, but we actually only believe the stats”
P<0.05, t-test (NHST), a description if you’re lucky
Do the distributions support use of NHST or t-test (are the data Normal)?
…we trust the P-values
Bar plots hide inappropriate assumptions
Source: Weissgerber et al. (2015)
Figures can mislead
reinforce poor practice
binary thinking
overlooking data distributions and wrong statistical assumptions for tests
overlooking uncertainty
suggest neat stories (P<0.05)
data, like life, can be messy
Ways forward
Now what?
“Thanks for undermining me. Now what do I do about it?”
Other data representations are available
Data visualisation/statistics training courses
Research Data Visualisation Workshops
Data Carpentry
Software Carpentry
Anscombe’s Quartet
Four datasets: same means and standard deviations
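The claim can be checked directly. The sketch below uses the published y values of Anscombe's quartet (Anscombe, 1973); it assumes the slide refers to the y columns, which the deck does not state, and the helper name `summarise` is hypothetical.

```python
import statistics

# y values of Anscombe's quartet (Anscombe, 1973).
ANSCOMBE_Y = {
    "I":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV":  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}


def summarise(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


summaries = {name: summarise(ys) for name, ys in ANSCOMBE_Y.items()}

for name, (mean, sd) in summaries.items():
    print(f"dataset {name:3s} mean={mean:.2f} sd={sd:.2f} "
          f"min={min(ANSCOMBE_Y[name]):.2f} max={max(ANSCOMBE_Y[name]):.2f}")
```

The means and standard deviations agree to two decimal places, while the minimum and maximum already hint that the underlying values are distributed differently.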
Boxplots
Median, interquartiles, outliers
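The quantities a boxplot draws can be computed with the standard library. This sketch assumes the common Tukey convention (points more than 1.5 × IQR beyond the quartiles are flagged as outliers); the slides do not say which rule was used, and both the data and the function name `boxplot_summary` are hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Return the median, quartiles, IQR and Tukey-rule outliers of values."""
    # Quartiles by linear interpolation between the sorted data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr    # assumed Tukey fences, not taken from the slides
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Hypothetical measurements (assumption: illustrative values only).
measurements = [2, 3, 4, 4, 5, 5, 6, 7, 20]
box = boxplot_summary(measurements)
print(box)
```

Unlike the mean and standard deviation, the median and quartiles barely move when the single extreme value is included, and that value is reported separately as an outlier.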
Raw data
1D scatterplots
Box and raw data
Boxplots and jittered 1D scatterplots
Violin plot
Data density estimate
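The outline of a violin plot is a density estimate of the data. A minimal sketch of one such estimate, a Gaussian kernel density estimate with a hand-picked bandwidth, is given below. Plotting libraries usually choose the bandwidth automatically; the fixed value, the data and the function name `gaussian_kde` are assumptions for illustration.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a function estimating the data density at a point x.

    Each data point contributes a Gaussian bump of width `bandwidth`;
    the estimate is the average of those bumps.
    """
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


# Hypothetical bimodal data (assumption: illustrative values only); mean is 5.
bimodal = [1, 1, 2, 8, 9, 9]
density = gaussian_kde(bimodal, bandwidth=1.0)

for x in (1.0, 5.0, 9.0):
    print(f"x={x:.0f}  density={density(x):.3f}")
```

The estimated density is high near the two clusters and low at the mean, which is the shape a violin plot would show and a bar chart would hide.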
Violin and raw data
Stacked, not jittered, data